Suppose that univariate data are drawn from a mixture of two
distributions that are equal up to a shift parameter. Such a model is
known to be nonidentifiable from a nonparametric viewpoint. However,
if we assume that the unknown mixed distribution is symmetric, we
obtain the identifiability of this model, which is then defined by
four unknown parameters: the mixing proportion, two location
parameters and the cumulative distribution function of the symmetric
mixed distribution. We propose estimators for these four parameters
when no training data are available. Our estimators are shown to be
strongly consistent under mild regularity assumptions and their
convergence rates are studied. Their finite-sample properties are
illustrated by a Monte Carlo study and our method is applied to real
data.

1. Introduction. Cumulative distribution functions (c.d.f.) of p-variate
multi-component mixture models are generally defined by

(1) $G(x) = \sum_{j=1}^{k} \lambda_j F_j(x), \quad x \in \mathbb{R}^p,$

where the unknown mixture proportions $\lambda_j$ ($\lambda_j \ge 0$ and
$\sum_{j=1}^{k} \lambda_j = 1$) and the unknown c.d.f. $F_j$ are to be
estimated. It is commonly assumed that the $F_j$'s belong to a parametric
family, which means that the space of unknown parameters is reduced to a
Euclidean set, leading to parametric inference. There is an extensive
literature on this subject, including the monographs of Everitt and Hand
[16], Titterington, Smith and Makov [40], McLachlan and Basford [28] and
McLachlan and Peel [29]. The main types of estimators that have been
proposed are the following: maximum likelihood (e.g., [7, 24, 25, 35]),
minimum chi-square (e.g., [11]), method of moments (e.g., [26]), Bayesian
approaches (e.g., [13, 15]) and techniques based on moment generating
functions (e.g., [34]). Note that the number of components k in model (1)
may also be an unknown parameter to be estimated, leading to various rates
of convergence for maximum likelihood estimators, as discussed by Chen [6].
In this case, the selection of a model is an important topic; see, for
example, [10, 22, 23].

Received November 2004; revised October 2005.

AMS 2000 subject classifications. Primary 62G05, 62G20; secondary 62E10.
Key words and phrases. Semiparametric, two-component mixture model,
identifiability, contrast estimators, consistency, rate of convergence,
mixing operator.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2006, Vol. 34, No. 3, 1204-1232. This reprint differs from the original in
pagination and typographic detail.

L. BORDES, S. MOTTELET AND P. VANDEKERKHOVE

The choice of a parametric family for the $F_j$'s may be difficult when little
is known about subpopulations. However, models of type (1) are generally 
nonparametrically nonidentifiable without additional assumptions. This is 
no longer true when training data are available, that is, when some data 
are of known origin with respect to the components of the mixture distribu- 
tion. In this case nonparametric techniques can be applied; see, for example, 
[4, 17, 21, 31, 33, 36, 39, 40]. As Hall and Zhou [18] state, "very little is 
known about nonparametric inference in mixtures without training data." 
These authors looked at p-variate data drawn from a mixture of two
distributions, each having independent components, and proved that, under
mild regularity assumptions, their model is identifiable for $p \ge 3$.
They proposed root-n consistent estimators of the 2p univariate marginal
distributions and the mixing proportion. In a working paper Kitamura [20]
investigates identifiability of type (1) models with the presence of
covariates.

Note that even if model (1) is not nonparametrically identifiable, there 
exist, for p = 1 and k = 2, many real data sets in the statistical literature for 
which such a model is used under parametric assumptions for the $F_j$'s. For
example, Azzalini and Bowman [1] provided data on the length of intervals 
between eruptions and the duration of the eruption for the Old Faithful 
Geyser in Yellowstone National Park. Another example deals with average 
amounts of precipitation (rainfall) in inches for United States cities (from the 
Statistical Abstract of the United States, 1975; see [30]). These two data sets 
are available in the R statistical package. Moreover, in some studies, the only 
parameters of interest are mixture proportions, in which case components 
Fj in model (1) are nuisance parameters (see, e.g., [8]). In this paper we 
consider the two-component identifiable restriction of model (1) defined by

(2) $G(x) = \lambda F(x - \mu_1) + (1 - \lambda) F(x - \mu_2), \quad x \in \mathbb{R}.$

Unknown parameters are the c.d.f. $F$ of a symmetric distribution, two real
location parameters $\mu_1$ and $\mu_2$ and the mixing proportion $\lambda$.
Note that this model has also been studied by Hunter, Wang and
Hettmansperger [19] in an independent work. Model (2) above is called
semiparametric inasmuch as the unknown parameters can be separated into a
functional part $F$ and a
Euclidean part $(\mu_1, \mu_2, \lambda)$. Note that such a model should be distinguished
from the so-called semi- or nonparametric mixture models (e.g., [27]) where 
G is defined by 

(3) $G(x) = \int_{\mathbb{R}} F(x;\theta)\,dH(\theta), \quad x \in \mathbb{R},$

where F belongs to a parametric family and H is an unknown distribution 
function on R. However, as noted by Lindsay and Lesperance [27], there is a 
link between models (1) and (3) if H is discrete with k points of support. Of 
course, such a link exists between models (2) and (3) by assuming that, in the
latter model, $F(\cdot;\theta) = F(\cdot - \theta)$ with $F$ in the c.d.f. family
$\mathcal{F}$ of symmetric distributions, and that $H$ puts masses $\lambda$ and
$1 - \lambda$ at points $\mu_1$ and $\mu_2$, respectively.

One of the fundamental issues with mixture models of type (1) is to
provide identifiability results. When the $F_j$'s belong to certain specific
parametric families (e.g., the continuous exponential family),
identifiability results are available; see, for example, [2, 5, 38]. More is
needed when we aim to estimate the $F_j$'s nonparametrically (see [18] for
the two components case). Working with model (2), we need to prove that $G$
is defined by a unique quadruple $(\lambda, \mu_1, \mu_2, F)$.

The paper is organized as follows. In the next section we give an iden- 
tifiability result for model (2). In Section 3 we provide a methodology for 
estimating unknown parameters in our two-component mixture model. Con- 
sistency results and convergence rates of our estimators are given in the same 
section. Our main results are proved in Section 5. In Section 4 finite-sample 
properties of our estimators are illustrated by a Monte Carlo study and our 
method is applied to precipitation data. 

2. Identifiability. First, note that if $F$ in model (2) admits a density
function $f$ (an even function), the mixture distribution admits a density
function $g$ defined by

(4) $g(x) = \lambda f(x - \mu_1) + (1 - \lambda) f(x - \mu_2), \quad x \in \mathbb{R},$

where $\theta = (\lambda, \mu_1, \mu_2) \in \Theta = [0, 1/2) \times (\mathbb{R}^2 \setminus \Delta)$
and $\Delta = \{(x, x);\ x \in \mathbb{R}\}$.

The aim of this section is to investigate identifiability, that is, the
possibility of having

(5) $\lambda F(x - \mu_1) + (1 - \lambda) F(x - \mu_2) = \lambda' F'(x - \mu_1') + (1 - \lambda') F'(x - \mu_2') \quad \forall x \in \mathbb{R},$

for two different quadruples $(\theta, F)$ and $(\theta', F')$ in
$\Theta \times \mathcal{F}$, where $\theta' = (\lambda', \mu_1', \mu_2')$
and $\mathcal{F}$ is the c.d.f. set of symmetric distributions. Note that it
is sufficient to consider $\lambda \in [0, 1/2)$ because the model is
invariant by permutation of $(\lambda, \mu_1)$ and $(1 - \lambda, \mu_2)$.
Note also that what we mean by identifiability is not entirely an
injectivity condition, since if $\lambda = 0$, we only need to obtain
$\lambda' = 0$, $\mu_2 = \mu_2'$ and $F = F'$. Clearly, identifiability fails
if we allow $\lambda$ to be equal to 1/2. Indeed, suppose that $f$ is itself
an even mixture density function, for example,
$f(x) = h(x - \mu)/2 + h(x + \mu)/2$ with $h$ an even density function. If
$g(x) = f(x - \mu_2)$ with $\lambda = 0$, then (5) is obviously satisfied
with $\lambda' = 1/2$, $\mu_1' = \mu + \mu_2$, $\mu_2' = \mu_2 - \mu$ and
$f' = h$. The main identifiability result is summarized in the following
theorem.
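
The failure of identifiability at $\lambda = 1/2$ can be checked numerically. The sketch below is only an illustration: it assumes that $h$ is the standard Gaussian density and uses hypothetical values for $\mu$ and $\mu_2$, and it compares the two representations of $g$ on a few points.

```python
import math

def h(x):
    """Even density chosen for illustration: the standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f(x, mu):
    """Even mixture density f(x) = h(x - mu)/2 + h(x + mu)/2."""
    return 0.5 * h(x - mu) + 0.5 * h(x + mu)

def g_single(x, mu, mu2):
    """Representation with lambda = 0: g(x) = f(x - mu2)."""
    return f(x - mu2, mu)

def g_half(x, mu, mu2):
    """Representation with lambda' = 1/2, mu1' = mu + mu2, mu2' = mu2 - mu, f' = h."""
    return 0.5 * h(x - (mu + mu2)) + 0.5 * h(x - (mu2 - mu))

mu, mu2 = 1.5, 0.7  # hypothetical values, not taken from the paper
points = [-4.0, -1.0, 0.0, 0.7, 2.2, 5.0]
gap = max(abs(g_single(x, mu, mu2) - g_half(x, mu, mu2)) for x in points)
print(gap)  # the two parametrizations give the same density up to rounding
```

Both functions describe the same density, so no estimator could tell the two parametrizations apart once $\lambda = 1/2$ is allowed.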

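Theorem 2.1. If $(\lambda, \mu_1, \mu_2, F)$ and $(\lambda', \mu_1', \mu_2', F')$
are two parameters of $[0, 1/2) \times (\mathbb{R}^2 \setminus \Delta) \times \mathcal{F}$
satisfying (5), then $\lambda = \lambda'$, $\mu_2 = \mu_2'$ and $F = F'$,
and $\mu_1 = \mu_1'$ if $\lambda > 0$.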

Hunter, Wang and Hettmansperger [19] have established a similar result
for the parametric part $(\lambda, \mu_1, \mu_2)$ of the model. Their results
are slightly different from ours since they considered identifiability from
the injectivity point of view. They also gave a necessary condition for
identifying a type (2) model with three components.

A question which naturally arises concerns the possibility of extending
our identifiability result when scale parameters are introduced into model
(2). In fact, it is easy to show that such a model is generally not identifiable.

3. Methodology and theoretical results. Let $X_1, \ldots, X_n$ be $n$
independent and identically distributed random variables with common c.d.f.
$G$ given by model (2). We shall denote by $\theta_0$ and $F_0$ the true
values of the unknown Euclidean parameter and the unknown mixed c.d.f. The
aim of this section is to propose estimators for $\theta_0$ and $F_0$.
Asymptotic results are given with respect to $n \to +\infty$.

The first key idea developed in Section 3.1 is based on the possibility of
expressing $F$ as a function of $G$ and $\theta$ (resp. $f$ as a function of
$g$ and $\theta$) by inverting the relation (2) [resp. by inverting the
relation (4)]. The second key idea, developed in Section 3.2, involves using
the symmetry property of $F_0$ in order to propose a contrast function for
the Euclidean parameter $\theta$ when $G$ is known. Then, in Sections 3.3 and
3.4, replacing $G$ by the corresponding empirical c.d.f., we propose
estimators of $\theta_0$ and $F_0$ and give some asymptotic results for these
estimates. These results are obtained under two kinds of assumptions on
$F_0$:

C1. $F_0$ is strictly increasing and Lipschitz on $\mathbb{R}$.

C2. $F_0$ is strictly increasing, twice continuously differentiable on $\mathbb{R}$ and $F_0 \in$

3.1. Inversion formula. Assume that in the mixture model defined by
(2) the Euclidean parameter $\theta = (\lambda, \mu_1, \mu_2)$, with
$\mu_1 \ne \mu_2$ and $\lambda \in [0, 1/2)$, is known. The key idea consists
in rewriting (2) as

(6) $F(x) = \frac{1}{1-\lambda}\, G(x + \mu_2) + \frac{-\lambda}{1-\lambda}\, F(x + \eta) \quad \forall x \in \mathbb{R},$

where $\eta = \mu_2 - \mu_1 \ne 0$, and hence, using (6) as a recurrence
formula. Let $\ell$ be a positive integer. By using (6) $\ell$ times, we get

(7) $F(x) = \frac{1}{1-\lambda} \sum_{k=0}^{\ell-1} \left(\frac{-\lambda}{1-\lambda}\right)^{k} G(x + \mu_2 + k\eta) + \left(\frac{-\lambda}{1-\lambda}\right)^{\ell} F(x + \ell\eta) \quad \forall x \in \mathbb{R}.$

Let us show that

(8) $F(x) = \frac{1}{1-\lambda} \sum_{k \ge 0} \left(\frac{-\lambda}{1-\lambda}\right)^{k} G(x + \mu_2 + k\eta) \quad \forall x \in \mathbb{R}.$

If we denote by $H$ the right-hand side in (8), then by (7) we get, for all
$\ell \ge 1$,

$\|F - H\|_\infty \le \left(\frac{\lambda}{1-\lambda}\right)^{\ell} \left(1 + \frac{1}{1-2\lambda}\right),$

where $\|\cdot\|_\infty$ denotes the supremum norm. Since the right-hand side
of the above inequality can be made arbitrarily small, it follows that
$F = H$. Similarly, working with densities [see (4)] and replacing the
supremum norm by the $L^1(\mathbb{R})$-norm $\|\cdot\|_1$, we get

(9) $f(x) = \frac{1}{1-\lambda} \sum_{k \ge 0} \left(\frac{-\lambda}{1-\lambda}\right)^{k} g(x + \mu_2 + k\eta)$ for almost all $x \in \mathbb{R}$

with respect to Lebesgue measure on $\mathbb{R}$.

At this point it is convenient to introduce the linear bounded operators
$A_\theta$ and $A_\theta^{-1}$ defined by

(10) $A_\theta = \lambda \tau_{\mu_1} + (1 - \lambda) \tau_{\mu_2}$ and $A_\theta^{-1} = \frac{1}{1-\lambda} \sum_{k \ge 0} \left(\frac{-\lambda}{1-\lambda}\right)^{k} \tau_{-\mu_2 - k\eta},$

where $(\tau_\mu, \mu \in \mathbb{R})$ is the shift operator defined by
$\tau_\mu f = f(\cdot - \mu)$. With the above definitions of $A_\theta$ and
$A_\theta^{-1}$, formulae (2) and (4) are equivalent to $G = A_\theta F$ and
$g = A_\theta f$, respectively, whereas formulae (8) and (9) are equivalent
to $F = A_\theta^{-1} G$ and $f = A_\theta^{-1} g$, respectively.
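
The inversion formula (8) can be illustrated with a few lines of code. The following is a minimal sketch, assuming that $F$ is the standard Gaussian c.d.f. and that the series is truncated after a fixed number of terms; the function names and the parameter values are hypothetical.

```python
import math

def norm_cdf(x):
    """C.d.f. of the standard Gaussian, a symmetric F used for illustration."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mix(F, lam, mu1, mu2):
    """A_theta F: x -> lam * F(x - mu1) + (1 - lam) * F(x - mu2), as in (2)."""
    return lambda x: lam * F(x - mu1) + (1.0 - lam) * F(x - mu2)

def unmix(G, lam, mu1, mu2, terms=200):
    """Truncated version of the series (8), i.e. an approximation of A_theta^{-1} G."""
    eta = mu2 - mu1
    r = -lam / (1.0 - lam)  # |r| < 1 because lam < 1/2

    def F(x):
        total, weight = 0.0, 1.0
        for k in range(terms):
            total += weight * G(x + mu2 + k * eta)
            weight *= r
        return total / (1.0 - lam)

    return F

lam, mu1, mu2 = 0.3, -1.0, 2.0  # hypothetical parameter values
G = mix(norm_cdf, lam, mu1, mu2)
F_rec = unmix(G, lam, mu1, mu2)
max_err = max(abs(F_rec(x) - norm_cdf(x)) for x in [-3.0, -1.0, 0.0, 0.5, 2.0])
print(max_err)  # recovery error of F from G at a few points
```

The geometric weight $(-\lambda/(1-\lambda))^k$ decays quickly when $\lambda$ is well below 1/2, which is why a modest truncation level suffices here; closer to 1/2 more terms would be needed.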

The interest of the operator $A_\theta^{-1}$ is that if $\theta$ is known,
the c.d.f. $F$ may be recovered from a nonparametric estimate $\hat G$ of $G$
by considering the reversed estimate $\hat F = A_\theta^{-1} \hat G$. This
also holds for the density $f$, defining $\hat f = A_\theta^{-1} \hat g$ with
$\hat g$ a nonparametric estimator of $g$. Unfortunately, the Euclidean
parameter $\theta$ is generally unknown and thus we need to propose an
estimate of $\theta$ separately. It should be noted that the above inversion
formulae do not require the model to be identifiable. We saw in Section 2
that a crucial factor in obtaining identifiability is using the symmetry of
the unknown mixed distribution. In the next paragraph we use the symmetry of
the mixed distribution to provide a contrast function.

3.2. A contrast function. The second key point follows from the following
simple remark. Let $F_\theta = A_\theta^{-1} G = A_\theta^{-1} A_{\theta_0} F_0$,
where $\theta \in \Theta$. Clearly, if $\theta = \theta_0$, we have
$F_{\theta_0} = F_0$ (from Section 3.1), and it must have the invariance
property of c.d.f.'s of symmetric distributions, $F_0(x) = 1 - F_0(-x)$. For
simplicity, let us introduce $S_r$, the symmetry operator defined by
$S_r F(\cdot) = 1 - F(-\cdot)$. The preceding remark may be reformulated as
follows: if $\theta = \theta_0$, then $A_\theta^{-1} G = S_r A_\theta^{-1} G$
or, equivalently, $G = A_\theta S_r A_\theta^{-1} G$, by applying $A_\theta$
on the left-hand side of the last equality. What about the converse? The
answer is given in the following theorem, whose proof is given in Section 5.

Theorem 3.1. Consider model (2) with $F_0$ the c.d.f. of a symmetric
distribution and $\theta_0 \in \Theta$. If, for $\theta \in \Theta$, we have
$G = A_\theta S_r A_\theta^{-1} G$, then $\theta = \theta_0$.

Assuming that $G$ is known, we can recover the true value $\theta_0$ of
$\theta$ by minimizing a discrepancy measure between $G$ and
$G_\theta = A_\theta S_r A_\theta^{-1} G$. Recall that $G$ is unknown but can
be estimated, which is why we choose to consider the discrepancy measure $K$,
defined by

(11) $K(\theta) = K(\theta; G) = \int_{\mathbb{R}} (G_\theta(x) - G(x))^2 \, dG(x), \quad \theta \in \Theta.$

The choice of introducing the weighting measure $dG$ in the above integral
follows from the consideration that if $G$ is replaced by its empirical
c.d.f., then the integral sign turns into a simple sum. As a consequence of
the preceding theorem, assuming that $F$ is sufficiently smooth and that $G$
is known, we are able to show that $K$ is a contrast function for the unknown
Euclidean parameter $\theta$.

Corollary 3.1. Under assumption C1, $K$ is a contrast function: for
all $\theta \in \Theta$, $K(\theta) \ge 0$ and $K(\theta) = 0$ if and only if
$\theta = \theta_0$.
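
The contrast property of Corollary 3.1 can be visualized at the population level. The sketch below is an illustration only: it assumes a Gaussian $F_0$, known locations, hypothetical true parameters, a truncated version of the series (8) and a Riemann sum in place of the integral in (11).

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

MU1, MU2, LAM0 = -1.0, 2.0, 0.3  # hypothetical true parameters

def G(x):
    """Mixture c.d.f. (2) with a standard Gaussian F_0."""
    return LAM0 * norm_cdf(x - MU1) + (1.0 - LAM0) * norm_cdf(x - MU2)

def g(x):
    """Mixture density (4)."""
    return LAM0 * norm_pdf(x - MU1) + (1.0 - LAM0) * norm_pdf(x - MU2)

def inv_mix(lam, y, terms=400):
    """Truncated series (8): (A_theta^{-1} G)(y) for theta = (lam, MU1, MU2)."""
    eta = MU2 - MU1
    r = -lam / (1.0 - lam)
    total, weight = 0.0, 1.0
    for k in range(terms):
        total += weight * G(y + MU2 + k * eta)
        weight *= r
    return total / (1.0 - lam)

def G_theta(lam, x):
    """(A_theta S_r A_theta^{-1} G)(x), with S_r F(.) = 1 - F(-.)."""
    sym = lambda y: 1.0 - inv_mix(lam, -y)
    return lam * sym(x - MU1) + (1.0 - lam) * sym(x - MU2)

def contrast(lam, lo=-7.0, hi=8.0, step=0.25):
    """Riemann-sum approximation of K(theta; G) in (11)."""
    n = int(round((hi - lo) / step))
    total = 0.0
    for j in range(n + 1):
        x = lo + j * step
        total += (G_theta(lam, x) - G(x)) ** 2 * g(x) * step
    return total

values = {lam: contrast(lam) for lam in (0.1, 0.3, 0.45)}
print(values)  # the contrast vanishes (up to rounding) only at lam = LAM0
```

In practice $G$ is unknown and is replaced by an estimate, which is the subject of the next subsection; the sketch only shows why minimizing $K$ singles out the true mixing proportion.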
3.3. Estimators of the Euclidean parameter $\theta$. The above Corollary 3.1
suggests that the unknown Euclidean parameter $\theta$ should be estimated as
follows:

$\hat\theta_n = \arg\min_{\theta \in \Theta} K(\theta; G_n),$

where $G_n$ is an estimator of the c.d.f. $G$. It is important to note that
if $G_n$ is a stepwise function, $K(\theta; G_n)$ is also a stepwise function
with respect to parameters $\mu_1$ and $\mu_2$, and does not have the
required regularity properties for differentiable optimization techniques to
be applied in order to find $\hat\theta_n$. This is the reason why we need to
distinguish two cases: (P1) the parameters $\mu_1$ and $\mu_2$ are known and
(P2) the parameters $\mu_1$ and $\mu_2$ are unknown.

(P1) The parameters $\mu_1$ and $\mu_2$ are known, whereas $\lambda$ and $F$ are unknown.

For this problem, we suppose that the true mixing proportion $\lambda_0$
belongs to $[0, 1/2 - d]$, where $d \in (0, 1/2)$. In this case the parameter
$\theta$ reduces to $\lambda$ and we estimate $\lambda$ by

$\hat\lambda_n = \arg\min_{\lambda \in [0, 1/2 - d]} K(\lambda; \hat G_n),$

where $\hat G_n$ is the empirical c.d.f. of $G$ defined by

(12) $\hat G_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \le x} \quad \forall x \in \mathbb{R},$


where $1$ denotes the indicator function. Let us give an explicit formula for
$\hat G_\lambda = A_\lambda S_r A_\lambda^{-1} \hat G_n$ involving a sum of
$n$ terms:

(13) $\hat G_\lambda(x) = 1 + \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\lambda}{\lambda - 1}\, 1_{X_i \le \eta + 2\mu_1 - x} + \frac{1 - 2\lambda}{\lambda} \left(\frac{\lambda}{\lambda - 1}\right)^{L(i,x)} \right),$

where

$L(i,x) = \max\left(1, \left\lceil \frac{x - 2\mu_1 + X_i - \eta}{\eta} \right\rceil\right)$

and where $\lceil x \rceil$ denotes the smallest integer greater than or
equal to $x$ and $\eta = \mu_2 - \mu_1$. The following theorem, whose proof
is provided in Section 5, gives the asymptotic behavior of $\hat\lambda_n$.



Theorem 3.2. Assume that the c.d.f. $F_0$ satisfies assumption C1. Then
(i) $\hat\lambda_n$ converges almost surely to $\lambda_0$, and (ii) we have
$\sqrt{n}(\hat\lambda_n - \lambda_0) = O_P(1)$.
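
A closed form with $n$ terms for $A_\lambda S_r A_\lambda^{-1} \hat G_n$ can be checked against a direct, truncated evaluation of the operator series in (10). The sketch below is an illustration under stated assumptions: $\eta = \mu_2 - \mu_1 > 0$, $0 < \lambda < 1/2$, a simulated Gaussian mixture sample with hypothetical parameters, and function names that are not from the paper.

```python
import math
import random

def series_value(sample, lam, mu1, mu2, x, terms=400):
    """(A_lam S_r A_lam^{-1} G_n)(x), computed from the truncated series in (10)."""
    n = len(sample)
    eta = mu2 - mu1
    r = -lam / (1.0 - lam)

    def ecdf(y):
        return sum(1 for xi in sample if xi <= y) / n

    def inv(y):
        return sum(r ** k * ecdf(y + mu2 + k * eta) for k in range(terms)) / (1.0 - lam)

    def sym(y):
        return 1.0 - inv(-y)

    return lam * sym(x - mu1) + (1.0 - lam) * sym(x - mu2)

def closed_form(sample, lam, mu1, mu2, x):
    """Sum of n terms; assumes eta > 0 and 0 < lam < 1/2."""
    eta = mu2 - mu1
    r = lam / (lam - 1.0)
    total = 0.0
    for xi in sample:
        level = max(1, math.ceil((x - 2.0 * mu1 + xi - eta) / eta))
        indicator = 1.0 if xi <= eta + 2.0 * mu1 - x else 0.0
        total += r * indicator + (1.0 - 2.0 * lam) / lam * r ** level
    return 1.0 + total / len(sample)

random.seed(1)
lam, mu1, mu2 = 0.3, -1.0, 2.0  # hypothetical parameters
sample = [random.gauss(mu1 if random.random() < lam else mu2, 1.0) for _ in range(40)]
diffs = [abs(series_value(sample, lam, mu1, mu2, x) - closed_form(sample, lam, mu1, mu2, x))
         for x in (-2.0, 0.3, 1.7, 4.1)]
print(max(diffs))  # the two evaluations agree up to rounding
```

The closed form costs one pass over the sample per evaluation point, whereas the truncated series needs one pass per retained term, which matters when the contrast is evaluated repeatedly inside an optimizer.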



Note that if $F_0$ is assumed to admit a first-order moment, then, using the
first-order moment equation of $g$, we show that $\lambda_0$ can be directly
estimated by the natural empirical estimator

$\check\lambda_n = \frac{n^{-1} \sum_{i=1}^{n} X_i - \mu_2}{\mu_1 - \mu_2},$

which obviously satisfies results of the above theorem.
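
The moment estimator follows from $E(X) = \lambda \mu_1 + (1 - \lambda) \mu_2$, which holds because a symmetric $F_0$ with a first-order moment is centered at zero. A minimal sketch, assuming a simulated Gaussian mixture with hypothetical parameters:

```python
import random

def moment_estimate(sample, mu1, mu2):
    """Solve mean = lam * mu1 + (1 - lam) * mu2 for lam, with known locations."""
    mean = sum(sample) / len(sample)
    return (mean - mu2) / (mu1 - mu2)

random.seed(0)
lam0, mu1, mu2 = 0.3, -1.0, 2.0  # hypothetical true values
sample = [random.gauss(mu1 if random.random() < lam0 else mu2, 1.0)
          for _ in range(20000)]
lam_check = moment_estimate(sample, mu1, mu2)
print(lam_check)  # close to lam0 for a sample of this size
```

Unlike the contrast estimator, this shortcut needs the first moment of $F_0$ to exist and is unavailable as soon as the locations are unknown.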

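(P2) The parameters $\mu_1$, $\mu_2$, $\lambda$ and $F$ are unknown.

For this problem, we suppose that $\Theta = [0, 1/2 - d] \times \mathcal{X}$,
where $0 < d < 1/2$ and $\mathcal{X}$ is a compact subset of $\mathbb{R}^2$
such that $\mathcal{X} \cap \Delta = \emptyset$, and the unknown Euclidean
parameter $\theta_0$ is an interior point of $\Theta$. As explained
previously, we need to change $K(\cdot; \hat G_n)$ into the more regular
version $K_r(\cdot; \hat G_n)$ defined by

$K_r(\theta; \hat G_n) = \int_{\mathbb{R}} \left(\tilde G_\theta^{(n)}(x) - \tilde G_n(x)\right)^2 d\hat G_n(x),$

where $\tilde G_\theta^{(n)} = A_\theta S_r A_\theta^{-1} \tilde G_n$ and
$\tilde G_n(x) = \int_{-\infty}^{x} \hat g_n(y)\, dy$, with

$\hat g_n(x) = \frac{1}{b_n} \int_{\mathbb{R}} q\left(\frac{x - y}{b_n}\right) d\hat G_n(y),$

where $(b_n)_{n \ge 1}$ is a sequence of real numbers decreasing to 0. Our
numerical applications are based upon the kernel function $q$ defined by
$q(x) = (1 - |x|)\, 1_{|x| \le 1}$.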

As for the (P1) problem, we prove in Section 5 asymptotic results
summarized in the next theorem for the estimator $\hat\theta_n$. From a
general point of view, $\tilde G_n$ is a smooth estimate of the c.d.f. $G$
defined, for $x \in \mathbb{R}$, by

(14) $\tilde G_n(x) = \frac{1}{n} \sum_{k=1}^{n} Q\left(\frac{x - X_k}{b_n}\right),$

where $Q(x) = \int_{-\infty}^{x} q(y)\, dy$, with $q$ an even density function
with compact support and second-order moment equal to 1, and
$(b_n)_{n \ge 1}$ is a sequence of nonnegative real numbers decreasing to 0
with $n b_n \to +\infty$ and $\sqrt{n}\, b_n^2 = O(1)$.
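
For the triangular kernel $q(x) = (1 - |x|) 1_{|x| \le 1}$ the integrated kernel $Q$ has a simple closed form, so the smooth estimate (14) is easy to compute. A minimal sketch, assuming a small hypothetical sample and the bandwidth $b_n = n^{-1/4}$:

```python
def Q(x):
    """Integrated triangular kernel: Q(x) is the integral of (1 - |y|) over y <= x."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x < 0.0:
        return 0.5 * (1.0 + x) ** 2
    return 1.0 - 0.5 * (1.0 - x) ** 2

def smooth_cdf(sample, b):
    """Smooth c.d.f. estimate (14) with bandwidth b."""
    n = len(sample)
    return lambda x: sum(Q((x - xi) / b) for xi in sample) / n

sample = [-1.2, -0.4, 0.3, 1.9, 2.5]  # hypothetical data
b = len(sample) ** -0.25              # bandwidth n^(-1/4)
G_tilde = smooth_cdf(sample, b)
print([round(G_tilde(x), 4) for x in (-2.0, 0.0, 1.0, 3.0)])
```

Because $Q$ equals 0 below $-1$ and 1 above $1$, each observation only contributes a non-trivial value on a window of width $2 b_n$, which is what makes explicit finite-sum formulae possible.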

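The fact that $q$ has compact support leads to an explicit formula for
$\tilde G_\theta^{(n)}$, involving a sum of $n$ terms,

(15) $\tilde G_\theta^{(n)}(x) = 1 + \frac{1}{n} \sum_{i=1}^{n} \Bigg\{ \frac{\lambda}{\lambda - 1}\, Q\left(\frac{-x + \eta + 2\mu_1 - X_i}{b_n}\right) + \frac{1 - 2\lambda}{\lambda} \left(\frac{\lambda}{\lambda - 1}\right)^{L_2(i,x)} + \sum_{k = L_1(i,x)}^{L_2(i,x) - 1} \frac{1 - 2\lambda}{\lambda (1 - \lambda)} \left(\frac{\lambda}{\lambda - 1}\right)^{k} Q\left(\frac{-x + (k+1)\eta + 2\mu_1 - X_i}{b_n}\right) \Bigg\},$

where, for $k = 1, 2$,

$L_k(i,x) = \max\left(1, \left\lceil \frac{x - 2\mu_1 + X_i - \eta + (-1)^k b_n}{\eta} \right\rceil\right).$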

Theorem 3.3. If the c.d.f. $F_0$ satisfies C1, then $\hat\theta_n$ converges
almost surely to $\theta_0$. If, in addition, $F_0$ satisfies C2, we have
$n^{1/4 - \alpha}(\hat\theta_n - \theta_0) = o_{a.s.}(1)$, for all $\alpha > 0$.



3.4. Estimators of functional parameter $F$. As suggested by the inversion
formula (8), once we get a consistent estimator $\hat\theta_n$ of the unknown
(or partially unknown) Euclidean true parameter $\theta_0$, it is natural to
seek to approximate the unknown c.d.f. $F_0$ by
$A_{\hat\theta_n}^{-1} \hat G_n$. However, since we approximate the c.d.f. of
a symmetric distribution, we constrain the estimator $\hat F_n$ to satisfy
the invariance property $\hat F_n = S_r \hat F_n$, leading to the final
estimator

(16) $\hat F_n = \frac{1}{2} (I + S_r) A_{\hat\theta_n}^{-1} \hat G_n,$

where $I$ is the identity operator. By similar arguments, the unknown density
function $f_0$ can in turn be estimated by

(17) $\hat f_n = \frac{1}{2} (I + S_d) A_{\hat\theta_n}^{-1} \hat g_n,$

where the operator $S_d$ is defined by $(S_d f)(x) = f(-x)$ (corresponding to
the invariance property of densities of symmetric distributions). The next
theorem gives asymptotic results for both $\hat F_n$ and $\hat f_n$ for
problems (P1) and (P2). These results are proved in Section 5.



Theorem 3.4. (i) If $F_0$ satisfies C1, then $\|\hat F_n - F_0\|_\infty$
converges almost surely to 0 for problems (P1) and (P2).

(ii) Under C1, we have $\|\hat F_n - F_0\|_\infty = O_P(n^{-1/2})$ for
problem (P1). Under C2, for problem (P2) we have
$\|\hat F_n - F_0\|_\infty = o_{a.s.}(n^{-1/4 + \alpha})$ for any $\alpha > 0$.

(iii) Under C1 (resp. C2), for problem (P1) [resp. (P2)],
$\|\hat f_n - f_0\|_1$ converges almost surely to 0.
Let us notice that generally $\hat F_n$ (resp. $\hat f_n$) is not a c.d.f.
(resp. a density function). Indeed, by the definition of a mixture, $g$
belongs to the range of the operator $A_{\theta_0}$, whereas this is no
longer true for its approximate $\hat g_n$. Since there is no possibility
that $\hat g_n$ is a two-component mixture in the sense of model (2), it
follows that $A_{\hat\theta_n}^{-1} \hat g_n$ cannot be a density function,
and the same holds for $\hat f_n$. However, from a practical point of view,
we can easily transform estimators $\hat f_n$ into density functions. Let us
consider $f_n^* = \hat f_n 1_{\hat f_n \ge 0}$. It is straightforward to show
that $\|f_n^* - f_0\|_1 \le \|\hat f_n - f_0\|_1$, and then we have the almost
sure convergence of $\|f_n^* - f_0\|_1$ to 0, given the assumptions of
Theorem 3.4(iii). Moreover, under the same assumptions and with
$s_n = \int_{\mathbb{R}} f_n^*(x)\, dx$, we have

$|s_n - 1| = \left| \int_{\mathbb{R}} (f_n^*(x) - f_0(x))\, dx \right| \le \|f_n^* - f_0\|_1 \le \|\hat f_n - f_0\|_1 \xrightarrow{a.s.} 0.$

Therefore, $\tilde f_n = s_n^{-1} f_n^*$ are density functions that satisfy
$\|\tilde f_n - f_0\|_1 \to 0$ almost surely.
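
The positive-part correction and renormalization can be sketched on a grid. The example below is only an illustration: the "estimate" is a hypothetical signed perturbation of the standard Gaussian density, not an output of the estimation procedure, and integrals are replaced by Riemann sums.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

step = 0.01
grid = [-6.0 + step * j for j in range(1201)]

f0 = [phi(x) for x in grid]                              # target density
f_hat = [phi(x) - 0.02 * math.cos(3.0 * x) for x in grid]  # signed stand-in for an estimate

f_star = [max(v, 0.0) for v in f_hat]   # positive part
s_n = sum(f_star) * step                # Riemann sum for its integral
f_tilde = [v / s_n for v in f_star]     # renormalized to integrate to 1

def l1_distance(a, b):
    """Riemann-sum approximation of the L1 distance on the grid."""
    return sum(abs(u - v) for u, v in zip(a, b)) * step

print(l1_distance(f_hat, f0), l1_distance(f_star, f0), s_n)
```

Taking the positive part can only reduce the $L^1$ distance to a nonnegative target, which is the pointwise inequality behind $\|f_n^* - f_0\|_1 \le \|\hat f_n - f_0\|_1$.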



3.5. Discussion of the three-component case. As we discussed in Section
2, identifiability results exist (see [19]) for the following three-component
model:

$G(x) = \lambda_1 F(x - \mu_1) + \lambda_2 F(x - \mu_2) + \lambda_3 F(x - \mu_3) \quad \forall x \in \mathbb{R},$

where $F$ is the c.d.f. of a symmetric distribution, and the $\lambda_j$'s
are nonnegative real numbers with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. A
question naturally arises concerning the possibility of extending our
estimation method to the above model. Following the method presented in
Section 3.1, we get, for all $\ell \ge 1$,

$F(x) = \frac{G(x + \mu_3)}{\lambda_3} + \sum_{k=1}^{\ell-1} (-1)^k \sum_{(i_1, \ldots, i_k) \in \{1,2\}^k} \frac{\lambda_{i_1} \cdots \lambda_{i_k}}{\lambda_3^{k+1}}\, G(x + \mu_3 + \eta_{i_1} + \cdots + \eta_{i_k}) + (-1)^{\ell} \sum_{(i_1, \ldots, i_\ell) \in \{1,2\}^{\ell}} \frac{\lambda_{i_1} \cdots \lambda_{i_\ell}}{\lambda_3^{\ell}}\, F(x + \eta_{i_1} + \cdots + \eta_{i_\ell}) \quad \forall x \in \mathbb{R},$

where we suppose that $\lambda_3 > \max(\lambda_1, \lambda_2)$ and we denote
$\eta_i = \mu_3 - \mu_i$ for $i = 1, 2$. To prove that a type (8) formula
exists, we need to show that, for

SEMIPARAMETRIC MIXTURE MODEL 






all $x \in \mathbb{R}$, we have

(18) $F(x) = \frac{G(x + \mu_3)}{\lambda_3} + \sum_{k=1}^{+\infty} (-1)^k \sum_{(i_1, \ldots, i_k) \in \{1,2\}^k} \frac{\lambda_{i_1} \cdots \lambda_{i_k}}{\lambda_3^{k+1}}\, G(x + \mu_3 + \eta_{i_1} + \cdots + \eta_{i_k}).$

Unfortunately, taking $x > 1$, it is easy to see that (18) is not satisfied
by taking, for example, $\lambda_1 = \lambda_2 = 4/15$, $\lambda_3 = 7/15$,
$\mu_1 = 0$, $\mu_2 = -1$, $\mu_3 = 1$ and $F$ the c.d.f. of the uniform
distribution on $(-1, 1)$. Note, however, that if the inversion formula (18)
is valid [this is the case, e.g., for $2\max(\lambda_1, \lambda_2) < \lambda_3$],
the methodology proposed in this section for the two-component case may be
applied.
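
The counterexample can be made explicit. For $x > 1$ and the parameters above, every argument of $G$ in (18) exceeds 2, so every $G$ term equals 1 and the inner sum over $\{1,2\}^k$ reduces to $(\lambda_1 + \lambda_2)^k$. The sketch below, which uses exact fractions and a hypothetical helper name, computes the resulting partial sums.

```python
from fractions import Fraction

def partial_sum(lam1, lam2, lam3, terms):
    """Partial sums of (18) at a point where every G term equals 1."""
    rho = (lam1 + lam2) / lam3
    return sum((-rho) ** k for k in range(terms)) / lam3

# Counterexample from the text: lam1 + lam2 > lam3, so the series oscillates and blows up.
bad = (Fraction(4, 15), Fraction(4, 15), Fraction(7, 15))
d10, d50 = partial_sum(*bad, 10), partial_sum(*bad, 50)

# A case with 2 * max(lam1, lam2) < lam3: the partial sums converge to F(x) = 1.
good = (Fraction(1, 5), Fraction(1, 5), Fraction(3, 5))
c60 = partial_sum(*good, 60)

print(float(d10), float(d50), float(c60))
```

The partial sums equal $1 - (-\rho)^{\ell}$ with $\rho = (\lambda_1 + \lambda_2)/\lambda_3$, so the series converges to the correct value 1 exactly when $\rho < 1$.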

4. Numerical study. We consider two distinct problems. The first is to
estimate $\lambda$ given that $\mu_1$ and $\mu_2$ are known. In this case we
use an explicit formula for $\hat G_\lambda$. In the second case we estimate
$\theta = (\lambda, \mu_1, \mu_2)$ and we consider $\tilde G_\theta^{(n)}$,
the regularized version of $\hat G_\theta$. Explicit formulae for
$\hat G_\lambda$ and $\tilde G_\theta^{(n)}$ are given in (13) and (15).
Recall that the computation of $\tilde G_\theta^{(n)}$ involves the choice of
a bandwidth $b_n$. All the simulation results have been obtained with
$b_n = n^{-1/4}$. This value is not optimal to estimate the density $g$ but
it is compatible with the assumption $\sqrt{n}\, b_n^2 = O(1)$ needed to
achieve the convergence rate given in Theorem 3.3. Note that in all our
simulations the variance $\sigma_g^2$ under $g$ is close to 1; our choice for
$b_n$ is then close to the bandwidth that minimizes the mean integrated
squared error, usually approximated by $\sigma_g (4/(3n))^{1/5}$ (see, e.g.,
[3]). It is known to be a good approximation for normal data and a Gaussian
kernel but we cannot ensure that it leads to an optimal choice for our
problem. For the real example of rainfall data given at the end of this
section, we used the bandwidth ($b_n = 3.84$) provided by the R software.
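
The two bandwidth choices mentioned above can be compared directly. A small sketch, assuming $\sigma_g = 1$ and using hypothetical function names:

```python
import math

def simulation_bandwidth(n):
    """Bandwidth used in the simulations: n^(-1/4)."""
    return n ** -0.25

def rule_of_thumb_bandwidth(n, sigma=1.0):
    """Usual approximation sigma * (4 / (3 n))^(1/5) of the MISE-optimal bandwidth."""
    return sigma * (4.0 / (3.0 * n)) ** 0.2

for n in (100, 200, 400):
    b = simulation_bandwidth(n)
    # sqrt(n) * b^2 stays equal to 1, in line with the assumption sqrt(n) b_n^2 = O(1)
    print(n, round(b, 4), round(rule_of_thumb_bandwidth(n), 4), round(math.sqrt(n) * b * b, 4))
```

The two sequences have the same order of magnitude at these sample sizes, but only $n^{-1/4}$ keeps $\sqrt{n}\, b_n^2$ bounded, since the rule-of-thumb bandwidth decays like $n^{-1/5}$.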

Choice of optimization method. Problem (P1) attempts to find an estimate
$\hat\lambda_n$ of $\lambda$ when $\mu_1$ and $\mu_2$ are known,

(19) $\hat\lambda_n = \arg\min_{\lambda \in [0, 1/2 - d]} K(\lambda; \hat G_n).$

Problem (P2) attempts to find an estimate $\hat\theta_n$ of
$\theta = (\lambda, \mu_1, \mu_2)$,

(20) $\hat\theta_n = \arg\min_{\theta \in \Theta} K_r(\theta; \hat G_n).$

Both problems require the minimization of a differentiable functional. As far
as problem (19) is concerned, numerical experiments indicate that
$K(\cdot; \hat G_n)$ is strictly convex in $[0, 1/2 - d]$ and, thus, an
unconstrained minimization algorithm can safely be used, with a starting
point in this interval. We use the quasi-Newton BFGS (Broyden, Fletcher,
Goldfarb and Shanno) method (see, e.g., [32]). In the second case, some
experiments with the same unconstrained method show that
$K_r(\cdot; \hat G_n)$ is not convex, and that $K_r(\cdot; \hat G_n)$ has
local minima not belonging to $\Theta$. So we use the constrained version of
the BFGS algorithm, where bounds on the variables can be taken into account.
In both cases, we provide the gradient of the functional, which can be
readily computed from the explicit formulae given in Section 3.3. All the
computations are performed with Scilab.

Table 1

Empirical mean and standard error (in brackets) of $\lambda$ estimates,
obtained from 500 simulations of i.i.d. samples of size n, for the (P1)
problem with $\mu_1 = -1$ and $\mu_2 = 1$

  n      $\lambda = 0.15$   $\lambda = 0.25$   $\lambda = 0.35$
  100    0.151 (0.058)      0.256 (0.060)      0.347 (0.057)
  400    0.148 (0.031)      0.252 (0.032)      0.349 (0.029)

Numerical results of the Monte Carlo study for Gaussian mixtures. In
this section we denote by $\mathcal{N}(\mu, \sigma^2)$ a Gaussian distribution
with mean $\mu$ and variance $\sigma^2$. The performance of our method is
evaluated, via a Monte Carlo study, on the Gaussian mixture

$\lambda * \mathcal{N}(\mu_1, 1) + (1 - \lambda) * \mathcal{N}(\mu_2, 1),$

for the (P1) problem (see Table 1) and for the (P2) problem (see Tables
2 and 3). More precisely, Table 1 summarizes the performance of our method
for different values of $\lambda$, that is, $\lambda = 0.15$ (weakly bumped
model), $\lambda = 0.25$ (moderately bumped model) and $\lambda = 0.35$
(strongly bumped model), when $\mu_1 = -1$ and $\mu_2 = 2$ are known. Table 2
summarizes the performance of our method in estimating
$\lambda = 0.15, 0.25, 0.35$, and $\mu_1 = -1$, $\mu_2 = 2$, while Table 3
gives the performance of the standard maximum likelihood approach in the same
framework.

Table 2

Empirical mean and standard error of $(\lambda, \mu_1, \mu_2)$ semiparametric
estimates, obtained from 200 simulations of i.i.d. samples of
size n, for the (P2) problem with $b_n = n^{-1/4}$

  n     $(\lambda, \mu_1, \mu_2)$   Empirical means            Standard errors
  100   (0.15, -1, 2)               (0.161, -0.948, 2.030)     (0.052, 0.365, 0.137)
  200   (0.15, -1, 2)               (0.157, -1.027, 2.023)     (0.035, 0.283, 0.101)
  100   (0.25, -1, 2)               (0.249, -1.011, 2.009)     (0.060, 0.289, 0.154)
  200   (0.25, -1, 2)               (0.251, -1.000, 2.010)     (0.041, 0.195, 0.101)
  100   (0.35, -1, 2)               (0.347, -0.988, 1.990)     (0.056, 0.230, 0.145)
  200   (0.35, -1, 2)               (0.357, -0.976, 2.012)     (0.046, 0.176, 0.114)

Table 3

Empirical mean and standard error of $(\lambda, \mu_1, \mu_2)$ maximum
likelihood estimates, obtained from 200 simulations of i.i.d.
samples of size n, for the (P2) problem with $b_n = n^{-1/4}$

  n     $(\lambda, \mu_1, \mu_2)$   Empirical means            Standard errors
  100   (0.15, -1, 2)               (0.163, -0.987, 2.018)     (0.054, 0.431, 0.138)
  200   (0.15, -1, 2)               (0.152, -1.013, 2.004)     (0.035, 0.283, 0.089)
  100   (0.25, -1, 2)               (0.256, -1.008, 2.020)     (0.051, 0.268, 0.132)
  200   (0.25, -1, 2)               (0.247, -1.003, 2.004)     (0.046, 0.204, 0.114)
  100   (0.35, -1, 2)               (0.342, -1.041, 1.980)     (0.054, 0.231, 0.161)
  200   (0.35, -1, 2)               (0.345, -1.009, 1.991)     (0.041, 0.159, 0.111)

Comments on Tables 1-3. The results in Table 1 show first that the
empirical bias amounts to at most 2.4% of the true values, and that standard
errors are reasonably small. In order to clarify the analysis of the results
given in Table 1 and to quantify the influence of bumps on the estimation
efficiency, we can normalize the empirical standard errors with respect to
the true values of the parameters (std/$\lambda$). We obtain for
$\lambda = 0.15, 0.25, 0.35$, normalized empirical standard errors equal to
0.386, 0.240, 0.162, respectively, for n = 100, and equal to 0.206, 0.128,
0.0828, for n = 400. These indicators show, roughly speaking, that our
estimation method is around 2.4 times more precise when $\lambda = 0.35$, in
comparison with the case where $\lambda = 0.15$, and around 1.5 times more
precise when $\lambda = 0.35$ in comparison with the case where
$\lambda = 0.25$. This shows that the nonnegligibility of one subpopulation
with respect to the other subpopulation improves the quality of the
estimators.
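
The normalized standard errors and relative biases quoted for Table 1 are simple ratios. The sketch below recomputes them from the table entries; the dictionary layout and variable names are hypothetical.

```python
# Table 1 entries: {n: {true lambda: (empirical mean, standard error)}}
table1 = {
    100: {0.15: (0.151, 0.058), 0.25: (0.256, 0.060), 0.35: (0.347, 0.057)},
    400: {0.15: (0.148, 0.031), 0.25: (0.252, 0.032), 0.35: (0.349, 0.029)},
}

# Standard error divided by the true value of lambda.
normalized = {n: {lam: se / lam for lam, (_, se) in row.items()}
              for n, row in table1.items()}

# Absolute bias divided by the true value of lambda.
relative_bias = {n: {lam: abs(mean - lam) / lam for lam, (mean, _) in row.items()}
                 for n, row in table1.items()}

max_bias = max(v for row in relative_bias.values() for v in row.values())
ratio_015_vs_035 = normalized[100][0.15] / normalized[100][0.35]
ratio_025_vs_035 = normalized[100][0.25] / normalized[100][0.35]
print(normalized, max_bias, ratio_015_vs_035, ratio_025_vs_035)
```

Rounding explains small discrepancies in the last digit between these ratios and the figures quoted in the text.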

Concerning Tables 2 and 3, it is interesting to note that, when the
location parameters are unknown, the previous remark is no longer true. In
fact, even if the previous comments on empirical standard errors are clearly
relevant, it is worth noting that the smaller empirical bias is not obtained
for the highly bumped model, but for the moderately bumped model. To
explain this phenomenon, we can remark that when $\lambda = 0.15$, there are
few data to estimate $\mu_1$, whereas when $\lambda = 0.35$, even if there
are many more data to estimate $\mu_1$, this estimation is disturbed by the
left tail of the distribution centered on $\mu_2$. Finally, it is with
$\lambda = 0.25$ that we obtain the best compromise and therefore the best
estimates with regard to minimum bias. In addition, we observe that the
performance of the maximum likelihood approach (which is known to be
asymptotically efficient) is in the range of those obtained by our method,
which illustrates the good behavior of our semiparametric approach with
respect to the parametric approach.

A trimodal example. We use a basic symmetric density / which is already 
a mixture, that is, 

f(x) = \<p{x + 4) + \<p{x) + \<p{x - 4), 

where $\varphi$ is the density function of the standard Gaussian distribution. The
density of the simulated data is taken as 

g{x) = \f{x) + \f(x-A) VxGM. 

We performed the estimation on a simulated sample of size n = 100. The
results are given in Figure 1. Figure 1(a) shows $\hat f$ superimposed with
the true density function $f$ and Figure 1(b) shows the reconstructed density
function $\hat g(\cdot) = \hat\lambda \hat f(\cdot - \hat\mu_1) + (1 - \hat\lambda) \hat f(\cdot - \hat\mu_2)$,
using estimated values of $\lambda$, $\mu_1$ and $\mu_2$, superimposed with
the true density function $g$. The optimization required 31 iterations and 45
evaluations of $K_r(\cdot; \hat G_n)$ and its gradient.

Standard errors for Euclidean parameters are computed by the Jackknife
method (see, e.g., [14]). We observe that for a reasonably small sample size
n = 100 the reconstructed mixture density $\hat g$ almost yields the true
density $g$. The main differences appear around local modes and in the tails
of $g$.
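
The Jackknife standard error mentioned above can be sketched generically. The example below is an illustration only: it uses the delete-one Jackknife on a hypothetical small data set with the sample mean as the statistic, for which the Jackknife standard error coincides with the usual standard error of the mean. Applying it to the estimators of this paper would require re-running the optimization on each reduced sample.

```python
import math
import statistics

def jackknife_se(sample, statistic):
    """Delete-one Jackknife standard error of statistic(sample)."""
    n = len(sample)
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out))

data = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical observations
se = jackknife_se(data, statistics.mean)
print(se, statistics.stdev(data) / math.sqrt(len(data)))
```

Any estimator that maps a sample to a number can be passed in place of `statistics.mean`; for a vector parameter the procedure is applied componentwise.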

Numerical results on real data. We use the average amount of precipitation
(rainfall) in inches for each of 70 United States (and Puerto Rican)
cities (from the Statistical Abstract of the United States, 1975; see [30]).
We consider two models. The first is model (2) in which we denote by
$\hat\lambda$, $\hat\mu_1$, $\hat\mu_2$ and $\hat f$ the estimators of
$\lambda$, $\mu_1$, $\mu_2$ and $f$ (the density function of the c.d.f. $F$).
The second model is a parametric version of model (2) in which we assume that
$f$ is the density function of a centered Gaussian distribution with variance
equal to $\sigma^2$. Estimators of unknown parameters of the second model are
denoted by $\bar\lambda$, $\bar\mu_1$, $\bar\mu_2$ and $\bar\sigma^2$, and
calculated according to the maximum likelihood method.

Figure 2(a) shows $\hat f$, the estimator of $f$, superimposed with the
density function of $\mathcal{N}(0, \bar\sigma^2)$. Figure 2(b) shows
$\hat g$, the empirical estimate of $g$ (obtained by the kernel method),
superimposed with both the reconstructed density
$\hat\lambda \hat f(\cdot - \hat\mu_1) + (1 - \hat\lambda) \hat f(\cdot - \hat\mu_2)$
(using estimated values $\hat\lambda$, $\hat\mu_1$ and $\hat\mu_2$ of
$\lambda$, $\mu_1$ and $\mu_2$) and the density of the parametric model where
the Euclidean parameter is replaced by its maximum likelihood estimator. The
optimization required 32 iterations and 66 evaluations of
$K_r(\cdot; \hat G_n)$ and its gradient.

We observe in Figure 2(a) that the nonparametric density estimate $\hat f$
has two small symmetric bumps at the beginnings of its tails, while the
best-fitting Gaussian density obviously has no such feature and is sharper
around the origin. In Figure 2(b) we can see that these two additional
bumps account for the good fit of the reconstructed mixture density, except
in a small area around [−20, 0] (the area of interest being [−20, 75]),
where the best-fitting Gaussian mixture is slightly closer to $\hat g$.
Notice also that the smaller bump on the left of $\hat g$ is clearly
detected by our method, while the best-fitting Gaussian mixture almost
misses this feature. Again, standard errors (given in brackets) for the
Euclidean parameters are computed using the jackknife method.
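
For readers unfamiliar with the jackknife, the following minimal sketch shows the usual leave-one-out standard error formula. It is a generic illustration, not the code used for the figures; `jackknife_se` and `sample_mean` are hypothetical names, and `estimator` stands for any function mapping a list of observations to a number (for instance one coordinate of a fitted Euclidean parameter).

```python
import math


def jackknife_se(data, estimator):
    """Leave-one-out jackknife standard error of estimator(data).

    `data` is a list; `estimator` maps a list to a float.
    """
    n = len(data)
    # Recompute the estimator on each sample with one observation removed.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(loo) / n
    # Jackknife variance: (n - 1)/n times the sum of squared deviations.
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))


def sample_mean(xs):
    """Simple estimator used to sanity-check the formula."""
    return sum(xs) / len(xs)
```

For the sample mean this reduces to the classical standard error $s/\sqrt{n}$, which gives a quick check of the implementation.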

5. Proofs.

Fig. 1. Estimated parameters $\hat\mu_1 = 0.691$ (0.760), $\hat\mu_2 = 3.728$
(0.153), $\hat\lambda = 0.232$ (0.079) for $n = 100$ and $b_n = n^{-1/4}$, the
results in parentheses corresponding to the empirical standard errors.
(a) Graph of $\hat f$ (solid) and graph of $f$ (dashed). (b) Graph of
$\hat\lambda\hat f(\cdot-\hat\mu_1) + (1-\hat\lambda)\hat f(\cdot-\hat\mu_2)$
(solid) and graph of $g$ (dashed).

Fig. 2. Estimated parameters for model (2): $\hat\mu_1 = 13.107$ (3.299),
$\hat\mu_2 = 39.056$ (1.395), $\hat\lambda = 0.171$ (0.078) (the bandwidth is
fixed at 3.84). Estimated parameters for model
$\lambda\,\mathcal N(\mu_1,\sigma^2) + (1-\lambda)\,\mathcal N(\mu_2,\sigma^2)$:
$\tilde\mu_1 = 15.715$ (2.220), $\tilde\mu_2 = 40.773$ (1.297),
$\tilde\lambda = 0.235$ (0.060) and $\tilde\sigma = 8.504$ (1.187), the
results in parentheses corresponding to the empirical standard errors.
(a) Graph of the nonparametric density estimator $\hat f$ and graph of the
density of $\mathcal N(0,\tilde\sigma^2)$ (dashed). (b) Graph of $\hat g$
(dashed), graph of
$\hat\lambda\hat f(\cdot-\hat\mu_1) + (1-\hat\lambda)\hat f(\cdot-\hat\mu_2)$
(solid) and graph of the density of
$\tilde\lambda\,\mathcal N(\tilde\mu_1,\tilde\sigma^2) + (1-\tilde\lambda)\,\mathcal N(\tilde\mu_2,\tilde\sigma^2)$
(dash-dot).



5.1. Notation and preliminary results. According to whether we are looking
at density function estimation or c.d.f. estimation, the operators
$A_\theta$ and $A_\theta^{-1}$, given in (10), are defined, respectively, on
the spaces $L^1(\mathbb R)$ or $L^\infty(\mathbb R)$ (endowed with the usual
norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$, resp.). Independently of the
space under consideration, it is straightforward to check that the norms
(denoted $|||\cdot|||$) of the operators $A_\theta$ and $A_\theta^{-1}$, for
$\lambda \in [0, 1/2 - d]$ and $d \in (0, 1/2)$, satisfy

(21)    $|||A_\theta||| \le 1$  and  $|||A_\theta^{-1}||| \le \frac{1}{1-2\lambda} \le \frac{1}{2d}$.
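
The bound on the inverse operator can be checked numerically. The sketch below assumes the series form of the inverse that is used later in the proof of Lemma 5.2, namely $A_\theta^{-1}H(x) = \frac{1}{1-\lambda}\sum_{k\ge 0}(-\frac{\lambda}{1-\lambda})^k H(x+\mu_2+k\eta)$ with $\eta = \mu_2-\mu_1$, truncated after a finite number of terms. The function names and the truncation level are hypothetical choices made for illustration.

```python
import math


def normal_cdf(x):
    """Standard Gaussian c.d.f., used here as a symmetric test case for F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def apply_A(H, lam, mu1, mu2):
    """Mixing operator: (A H)(x) = lam*H(x - mu1) + (1 - lam)*H(x - mu2)."""
    return lambda x: lam * H(x - mu1) + (1.0 - lam) * H(x - mu2)


def apply_A_inv(H, lam, mu1, mu2, n_terms=200):
    """Truncated series inverse of A, meaningful for 0 <= lam < 1/2."""
    eta = mu2 - mu1
    ratio = -lam / (1.0 - lam)

    def inverse(x):
        total, coef = 0.0, 1.0
        for k in range(n_terms):
            total += coef * H(x + mu2 + k * eta)
            coef *= ratio
        return total / (1.0 - lam)

    return inverse


def inverse_norm_bound(lam, n_terms=200):
    """Sum of the absolute series coefficients, i.e. about 1/(1 - 2*lam)."""
    ratio = lam / (1.0 - lam)
    return sum(ratio ** k for k in range(n_terms)) / (1.0 - lam)
```

Applying `apply_A_inv` to `apply_A(normal_cdf, ...)` with the same parameters should give back `normal_cdf` up to truncation error, and `inverse_norm_bound(lam)` should agree with $1/(1-2\lambda)$, which blows up as $\lambda$ approaches 1/2.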

Let us recall some basic results on $\hat G_n$ and $\tilde G_n$ defined
respectively by (12) and (14). From well-known results on empirical
processes (see, e.g., [37]), for general distribution functions $G$, we have

(22)    $\sqrt{n}\,\|\hat G_n - G\|_\infty = O_P(1)$,

and the law of the iterated logarithm (LIL)

(23)    $\|\hat G_n - G\|_\infty = O_{a.s.}\Big(\sqrt{\frac{\log\log(n)}{n}}\Big)$.

If $\|f\|_\infty < \infty$ and if $f$ has derivative $f'$ with
$\|f'\|_\infty < \infty$, the same holds for $g$, and by Corollary 1, page 766
in [37], if $q$ has compact support, and if $\sqrt{n}\,b_n^2 = O(1)$, then we
have

(24)    $\sqrt{n}\,\|\tilde G_n - \hat G_n\|_\infty = O_{a.s.}(1)$.

Hence, the result (23) holds for $\tilde G_n$.

In the remainder of this paper we denote by $\dot L$ and $\ddot L$ the
first- and second-order derivatives of a general function $L$ with respect
to $\lambda \in [0, 1/2 - d]$ for problem (P1) and
$\theta \in \Theta = [0, 1/2 - d] \times X$ for problem (P2) (see Section 3.3
for assumptions on the Euclidean parameter space). In the sequel
$|\cdot|_2$ denotes the Euclidean norm.

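Lemma 5.1. There exists $c \in (0, +\infty)$ such that for all
$\theta \in \Theta$ and $n \ge 1$, we have

(25)    $\|G_\theta^{(n)} - G_\theta\|_\infty \le c\,\|\hat G_n - G\|_\infty$.

Proof. Straightforward, since we have

    $\|G_\theta^{(n)} - G_\theta\|_\infty = \|A_\theta S_r A_\theta^{-1}[\hat G_n - G]\|_\infty$
    $\le |||A_\theta^{-1}||| \times \|\hat G_n - G\|_\infty$.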



Lemma 5.2. Under C1, the mapping $\theta \mapsto G_\theta(x)$ is Lipschitz on
$\Theta$ uniformly in $x \in \mathbb R$, and the contrast function $K$ is
Lipschitz on $\Theta$.

Proof. For all $(\theta,\theta') \in \Theta^2$, we have
$|K(\theta) - K(\theta')| \le c\,\|G_\theta - G_{\theta'}\|_\infty$.
Therefore, it is sufficient to prove that $\theta \mapsto G_\theta(\cdot)$ is
uniformly Lipschitz on $\Theta$. Simple calculations lead to

(26)    $\|G_\theta - G_{\theta'}\|_\infty \le \|A_\theta S_r A_\theta^{-1}G - A_{\theta'} S_r A_\theta^{-1}G\|_\infty + \|A_\theta^{-1}G - A_{\theta'}^{-1}G\|_\infty$.

First we prove that the first term on the right-hand side of (26) is
Lipschitz. Let us remark now that, for all bounded functions $H$, we have,
for all $x \in \mathbb R$,

(27)    $|A_\theta H(x) - A_{\theta'} H(x)| \le 2|\lambda - \lambda'| \times \|H\|_\infty + \sup_{y\in\mathbb R}\max_{j=1,2}|H(y) - H(y - (\mu_j - \mu_j'))|$.

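On the other hand, denoting $\eta = \mu_2 - \mu_1$ (resp.
$\eta' = \mu_2' - \mu_1'$), we remark that for all $\theta \in \Theta$,
$A_\theta^{-1}G$ satisfies, for all $(z,z') \in \mathbb R^2$,

    $|S_r A_\theta^{-1}G(z) - S_r A_\theta^{-1}G(z')|$
    $= |A_\theta^{-1}G(-z) - A_\theta^{-1}G(-z')|$
    $= \Big|\frac{1}{1-\lambda}\sum_{k=0}^{\infty}\Big(-\frac{\lambda}{1-\lambda}\Big)^k \big(G(-z+\mu_2+k\eta) - G(-z'+\mu_2+k\eta)\big)\Big|$
    $\le \frac{1}{2d}\sup_{y\in\mathbb R}|G(y) - G(y - z + z')|$
    $\le \frac{1}{2d}\,|G|_{Lip}\,|z - z'|$,

because under C1 $G$ is Lipschitz with Lipschitz constant $|G|_{Lip}$. Now
replacing $H$ in (27) by $S_r A_\theta^{-1}G$, it follows from the above
inequality that

(28)    $|A_\theta S_r A_\theta^{-1}G(x) - A_{\theta'} S_r A_\theta^{-1}G(x)| \le c\,|\theta - \theta'|_2$.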

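It remains to be proved that the second term on the right-hand side of
inequality (26) is Lipschitz. We have

    $|A_\theta^{-1}G(x) - A_{\theta'}^{-1}G(x)|$
    $\le \Big|\frac{1}{1-\lambda}\sum_{k=0}^{\infty}\Big(-\frac{\lambda}{1-\lambda}\Big)^k \big(G(x+\mu_2+k\eta) - G(x+\mu_2'+k\eta')\big)\Big|$
    $\quad + \Big|\sum_{k=0}^{\infty}\Big[\frac{1}{1-\lambda}\Big(-\frac{\lambda}{1-\lambda}\Big)^k - \frac{1}{1-\lambda'}\Big(-\frac{\lambda'}{1-\lambda'}\Big)^k\Big] \times G(x+\mu_2'+k\eta')\Big|$.

$G$ is supposed to be Lipschitz. We have, for all $k \ge 0$,

    $|G(x+\mu_2+k\eta) - G(x+\mu_2'+k\eta')| \le |G|_{Lip}\,(k+1)\,|\theta - \theta'|_2$;

thus, we obtain, by the two previous inequalities,

(29)    $|A_\theta^{-1}G(x) - A_{\theta'}^{-1}G(x)| \le c_1\sum_{k=0}^{\infty}(k+1)\Big(\frac{\lambda}{1-\lambda}\Big)^k |\theta - \theta'|_2 + c_2\,\|G\|_\infty\,|\lambda - \lambda'| \le c_3\,|\theta - \theta'|_2$,

where $c_1$, $c_2$ and $c_3$ are nonnegative real constants. From
inequalities (26)-(29), it follows that the function
$\theta \mapsto G_\theta(x)$ is Lipschitz on $\Theta$ uniformly in $x$ and
thus, $K$ is Lipschitz on $\Theta$. □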



Lemma 5.3. For any $\alpha > 0$, under C1 we have

(30)    $\sup_{\theta\in\Theta}|K(\theta;\hat G_n) - K(\theta)| = o_{a.s.}(n^{-1/2+\alpha})$.

The same result holds replacing $K(\cdot;\hat G_n)$ by
$K_r(\cdot;\tilde G_n)$. It is an obvious consequence of properties of
$\tilde G_n$, since by the LIL result (23) for $\hat G_n$ and (24), we have
$\|\tilde G_n - G\|_\infty = O_{a.s.}\big(\sqrt{n^{-1}\log\log n}\big)$.

Proof of Lemma 5.3. Considering the random variables
$Z_i(\theta) = (G_\theta(X_i) - G(X_i))^2$ and using Lemma 5.1, we show that

    $|K(\theta;\hat G_n) - K(\theta)| \le c\,\|\hat G_n - G\|_\infty + \sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^{n}\big(Z_i(\theta) - E(Z_i(\theta))\big)\Big|$,

where $c$ is a nonnegative constant. The two terms on the right-hand side
no longer depend on $\theta$. The first tends to 0 with the desired rate of
convergence by the LIL result given in (23). The second term is the
supremum of an empirical process indexed by the functional class
$\mathcal H = \{h(\cdot,\theta) = (G_\theta(\cdot) - G(\cdot))^2,\ \theta\in\Theta\}$
of Lipschitz bounded functions. Indeed, we have, by Lemma 5.2,

    $|h(x,\theta) - h(x,\theta')| \le |G_\theta(x) + G_{\theta'}(x) - 2G(x)| \times |G_\theta(x) - G_{\theta'}(x)| \le c\,|\theta - \theta'|_2$.

Let $(\varepsilon_n)_{n\ge 1}$ be a sequence of real numbers decreasing to 0.
It follows by a Bernstein-type theorem of van der Vaart and Wellner ([41],
page 246) that there exist nonnegative constants $A$ and $B$ such that

    $P\Big(\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^{n}\big(Z_i(\theta) - E Z_i(\theta)\big)\Big| > \varepsilon_n\Big) \le A(\sqrt{n}\,\varepsilon_n)^B\exp(-2n\varepsilon_n^2)$.

It follows that if $\varepsilon_n = n^{-1/2+\alpha}$ with $\alpha > 0$, we
get, by the Borel-Cantelli lemma,

    $\sup_{\theta\in\Theta}\Big|\frac{1}{n}\sum_{i=1}^{n}\big(Z_i(\theta) - E Z_i(\theta)\big)\Big| = o_{a.s.}(n^{-1/2+\alpha})$,

which concludes the proof. □

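Lemma 5.4. Under C1, for $k = 1,2$, there exists a real constant $c > 0$
such that, for all $(\lambda_1,\lambda_2) \in [0, 1/2 - d]^2$ and
$L \in L^\infty(\mathbb R)$,

    $\Big\|\frac{\partial^k}{\partial\lambda^k}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_1} L - \frac{\partial^k}{\partial\lambda^k}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_2} L\Big\|_\infty \le c\,|\lambda_1 - \lambda_2| \times \|L\|_\infty$.

A straightforward consequence of the above lemma is that, for $k = 1,2$
and $L \in L^\infty(\mathbb R)$, there exists another real constant $c > 0$
such that

(31)    $\Big\|\frac{\partial^k}{\partial\lambda^k}[A_\lambda S_r A_\lambda^{-1}]\,L\Big\|_\infty \le c\,\|L\|_\infty$.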

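Proof of Lemma 5.4. We prove the uniform Lipschitz property only for the
case where $k = 1$, since the case where $k = 2$ uses the same technical
arguments. For all $(\lambda_1,\lambda_2) \in [0, 1/2 - d]^2$,
$L \in L^\infty(\mathbb R)$, and all $x \in \mathbb R$, we have

    $\Big|\frac{\partial}{\partial\lambda}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_1} L(x) - \frac{\partial}{\partial\lambda}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_2} L(x)\Big|$
    $\le \Big|\frac{1}{(1-\lambda_1)^2} - \frac{1}{(1-\lambda_2)^2}\Big| \sum_{k\ge 0}(k+1)\Big(\frac{\lambda_1}{1-\lambda_1}\Big)^k$
    $\qquad \times |L(-x+\mu_1+\mu_2+k\eta) - L(-x+2\mu_2+(k+1)\eta)|$
(32)
    $\quad + \frac{1}{(1-\lambda_2)^2} \sum_{k\ge 0}(k+1)\Big|\Big(\frac{\lambda_1}{1-\lambda_1}\Big)^k - \Big(\frac{\lambda_2}{1-\lambda_2}\Big)^k\Big|$
    $\qquad \times |L(-x+\mu_1+\mu_2+k\eta) - L(-x+2\mu_2+(k+1)\eta)|$.

By the mean value theorem, there exist $\bar\lambda$ and $\tilde\lambda$
lying on the line segment with extremities $\lambda_1$ and $\lambda_2$ such
that

    $\Big|\frac{1}{(1-\lambda_1)^2} - \frac{1}{(1-\lambda_2)^2}\Big| \le \frac{2}{(1-\bar\lambda)^3}\,|\lambda_1 - \lambda_2|$,

and for all $k > 0$,

    $\Big|\Big(\frac{\lambda_1}{1-\lambda_1}\Big)^k - \Big(\frac{\lambda_2}{1-\lambda_2}\Big)^k\Big| \le k\Big(\frac{\tilde\lambda}{1-\tilde\lambda}\Big)^{k-1}\frac{|\lambda_1 - \lambda_2|}{(1-\tilde\lambda)^2}$.

Using the above inequalities with (32), we obtain

    $\Big\|\frac{\partial}{\partial\lambda}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_1} L - \frac{\partial}{\partial\lambda}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_2} L\Big\|_\infty \le \frac{12\,\|L\|_\infty\,|\lambda_1 - \lambda_2|}{d^3}$,

which concludes the proof. □

5.2. Proof of Theorem 2.1.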

Step 1. Let $\{\sin(\alpha_1 t), \ldots, \sin(\alpha_p t)\}$ be a family of
$p$ functions defined on $\mathbb R$. These functions are linearly
independent if and only if we have

(33)    $\alpha_i \ne 0$ for $1 \le i \le p$  and  $|\alpha_i| \ne |\alpha_j|$ for $1 \le i < j \le p$.

Indeed, suppose that for $\beta_1, \ldots, \beta_p$ in $\mathbb R$ we have

    $\sum_{i=1}^{p}\beta_i\sin(\alpha_i t) = 0 \quad \forall t \in \mathbb R$.

Then, taking the derivative of the above expression with respect to $t$ at
orders $1, 3, \ldots, 2p-1$, we get at $t = 0$ the system of linear equations

    $\sum_{i=1}^{p}\beta_i\alpha_i^{2j+1} = 0$ for $0 \le j \le p-1$.

The corresponding determinant is a Vandermonde-type determinant different
from 0 if and only if (33) is satisfied.
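
The determinant criterion of Step 1 is easy to check on small examples. The sketch below builds the matrix with entries $\alpha_i^{2j+1}$ and evaluates its determinant exactly with rational arithmetic; the function names are hypothetical, and the inputs are assumed to be integers or fractions. The determinant factorizes as $\prod_i \alpha_i \prod_{i<j}(\alpha_j^2 - \alpha_i^2)$, which vanishes exactly when some $\alpha_i$ is zero or two of them share the same absolute value.

```python
from fractions import Fraction


def odd_power_matrix(alphas):
    """Matrix with entry alphas[i] ** (2*j + 1) in row j, column i."""
    p = len(alphas)
    return [[Fraction(a) ** (2 * j + 1) for a in alphas] for j in range(p)]


def determinant(matrix):
    """Exact determinant by Gaussian elimination on Fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a nonzero pivot in the current column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det
```

For instance, the frequencies (1, 2, 3) satisfy (33) and give a nonzero determinant, while (1, -1, 2) and (0, 1, 2) violate (33) and give zero.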

Step 2. We denote by $\Phi$ and $\Phi'$ the characteristic functions of $F$
and $F'$, respectively. Calculating the characteristic function of the two
sides in (5), we get, for all $t \in \mathbb R$,

(34)    $(\lambda\exp(it\mu_1) + (1-\lambda)\exp(it\mu_2))\,\Phi(t) = (\lambda'\exp(it\mu_1') + (1-\lambda')\exp(it\mu_2'))\,\Phi'(t)$.

Since $F$ and $F'$ are c.d.f.'s of symmetric distributions, their
characteristic functions are real continuous functions equal to 1 at
$t = 0$. We have from (34) that the imaginary part of

    $(\lambda\exp(it\mu_1) + (1-\lambda)\exp(it\mu_2))(\lambda'\exp(-it\mu_1') + (1-\lambda')\exp(-it\mu_2'))$

is equal to 0 in a neighborhood of 0. Then we have

(35)    $\lambda\lambda'\sin((\mu_1-\mu_1')t) + \lambda(1-\lambda')\sin((\mu_1-\mu_2')t) + (1-\lambda)\lambda'\sin((\mu_2-\mu_1')t) + (1-\lambda)(1-\lambda')\sin((\mu_2-\mu_2')t) = 0$

on the whole real line, by analyticity of sine functions. We shall now
consider two cases.

Case 1: $\lambda = 0$. Then (35) reduces to

(36)    $\lambda'\sin((\mu_2-\mu_1')t) + (1-\lambda')\sin((\mu_2-\mu_2')t) = 0$.

If $\lambda' > 0$, then we have $1-\lambda' > \lambda' > 0$, and by Step 1 we
need to consider the following cases:

• $\mu_2 = \mu_2'$ or $\mu_2 = \mu_1'$; hence by (36) $\mu_2' = \mu_1'$, which
is not admissible.

• $|\mu_2-\mu_1'| = |\mu_2-\mu_2'| \ne 0$, which by (36) leads to
$\lambda' + (1-\lambda') = 0$ (impossible) or $\lambda' - (1-\lambda') = 0$
(not admissible).
It follows that $\lambda' = \lambda = 0$ and, hence, by (35)
$\mu_2 = \mu_2'$.

Case 2: $\lambda > 0$. From Case 1, we also have $\lambda' > 0$. Therefore,
it remains to show that if $\mu_1 \ne \mu_2$, $\mu_1' \ne \mu_2'$,
$(\lambda,\lambda') \in (0, 1/2)^2$ and, for all $t \in \mathbb R$,

(37)    $\lambda\lambda'\sin((\mu_1-\mu_1')t) + \lambda(1-\lambda')\sin((\mu_1-\mu_2')t) + (1-\lambda)\lambda'\sin((\mu_2-\mu_1')t) + (1-\lambda)(1-\lambda')\sin((\mu_2-\mu_2')t) = 0$,

we have $(\lambda,\mu_1,\mu_2) = (\lambda',\mu_1',\mu_2')$. If we denote
$\beta_1 = \lambda\lambda'$, $\beta_2 = \lambda(1-\lambda')$,
$\beta_3 = \lambda'(1-\lambda)$ and $\beta_4 = (1-\lambda)(1-\lambda')$, then
(37) is equivalent to

(38)    $\beta_1\sin(\alpha t) + \beta_2\sin((\alpha'-\eta)t) + \beta_3\sin((\alpha+\eta)t) + \beta_4\sin(\alpha' t) = 0 \quad \forall t \in \mathbb R$,

where $\alpha = \mu_1 - \mu_1'$, $\alpha' = \mu_2 - \mu_2'$ and
$\eta = \mu_2 - \mu_1$. It is straightforward to see that if
$\alpha = \alpha' = 0$, then $\lambda = \lambda'$. Then it remains to show
that $(\alpha,\alpha') \ne (0,0)$ is not admissible. To avoid a lengthy proof,
we consider only the case $\alpha = 0$ and $\alpha' \ne 0$. The case
$\alpha \ne 0$ and $\alpha' = 0$ is its symmetric counterpart and the case
$\alpha \ne 0$ and $\alpha' \ne 0$ involves substantial calculations but is
straightforward. Hence, if we suppose that $\alpha = 0$ and
$\alpha' \ne 0$, equation (38) reduces to

(39)    $\beta_2\sin((\alpha'-\eta)t) + \beta_3\sin(\eta t) + \beta_4\sin(\alpha' t) = 0 \quad \forall t \in \mathbb R$.

Since $\alpha'$ and $\eta$ are nonnull, by Step 1, we have to consider the
following cases:

• $\alpha' = \eta$: hence, $(\beta_3 + \beta_4)\sin(\eta t) = 0$ for all
$t \in \mathbb R$. Then $\beta_3 + \beta_4 = 0$, which is not possible.

• $|\alpha' - \eta| = |\eta|$: hence, $\alpha' = 2\eta$. Then (39) reduces to

    $(\beta_2 + \beta_3)\sin(\eta t) + \beta_4\sin(2\eta t) = 0 \quad \forall t \in \mathbb R$,

which, again by Step 1, cannot be satisfied for all $t \in \mathbb R$.

• Cases $|\alpha' - \eta| = |\alpha'|$ and $|\eta| = |\alpha'|$ lead
respectively to $\alpha' = \eta/2$ and $\eta = -\alpha'$, hence, as in the
previous case, the resulting equations cannot be satisfied for all
$t \in \mathbb R$.

Step 3. Now, since $\lambda \in [0, 1/2)$ we have
$|\lambda\exp(it\mu_1) + (1-\lambda)\exp(it\mu_2)| \ge 1 - 2\lambda > 0$. Then
$\Phi = \Phi'$ and, finally, $F$ and $F'$ are equal.

5.3. Proofs of Theorem 3.1 and Corollary 3.1.

Proof of Theorem 3.1. Let us write $\Phi_H$ for the characteristic function
defined by $\Phi_H(t) = \int_{\mathbb R}\exp(itx)\,dH(x)$ for all
$t \in \mathbb R$. Using the definitions of $A_\theta$ and $A_\theta^{-1}$ in
(10), we obtain

(40)    $\Phi_{G_\theta}(t) = \frac{\lambda\exp(it\mu_1) + (1-\lambda)\exp(it\mu_2)}{\lambda\exp(-it\mu_1) + (1-\lambda)\exp(-it\mu_2)}\,\Phi_G(-t) \quad \forall t \in \mathbb R$.

Moreover, because
$\Phi_G(t) = (\lambda_0\exp(it\mu_1^0) + (1-\lambda_0)\exp(it\mu_2^0))\,\Phi_{F_0}(t)$
and $\Phi_{F_0}$ is an even function, $\Phi_{G_\theta}(t) = \Phi_G(t)$ for all
$t \in \mathbb R$ implies that the imaginary part of

    $(\lambda\exp(it\mu_1) + (1-\lambda)\exp(it\mu_2))(\lambda_0\exp(-it\mu_1^0) + (1-\lambda_0)\exp(-it\mu_2^0))$

is null in a neighborhood of 0. Finally, by Step 2 of the proof of Theorem
2.1, we conclude that $\theta = \theta_0$. □

Proof of Corollary 3.1. Given the assumptions concerning $F_0$, we show
that $G_\theta$ is a continuous function. By Theorem 3.1, if
$\theta \ne \theta_0$, there exists $x_0 \in \mathbb R$ such that
$G(x_0) \ne G_\theta(x_0)$, and there exist $\varepsilon > 0$ and $a > 0$ such
that $|G(x) - G_\theta(x)| > \varepsilon$ on $[x_0 - a, x_0 + a]$. It follows
that

    $K(\theta) \ge \varepsilon^2\int_{x_0-a}^{x_0+a} dG(x) = \varepsilon^2\,(G(x_0 + a) - G(x_0 - a)) > 0$.

Otherwise, if $\theta = \theta_0$ it is straightforward to check that
$K(\theta) = 0$. □



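5.4. Proof of Theorem 3.2. Since the consistency proof for
$\hat\lambda_n$ follows the lines of the consistency proof for
$\hat\theta_n$ of problem (P2), it is omitted. For the remainder of this
proof, we therefore suppose that $\hat\lambda_n$ converges almost surely to
$\lambda_0$. By a first-order Taylor expansion of $\dot K(\cdot;\hat G_n)$
around $\hat\lambda_n$, we have

(41)    $\ddot K(\lambda_n^*;\hat G_n)\,\sqrt{n}\,(\hat\lambda_n - \lambda_0) = -\sqrt{n}\,\dot K(\lambda_0;\hat G_n)$,

where $\lambda_n^*$ lies on the line segment with extremities $\lambda_0$ and
$\hat\lambda_n$. The desired result follows by proving the two statements

(42)    $\sqrt{n}\,\dot K(\lambda_0;\hat G_n) = O_P(1)$

and

(43)    $\ddot K(\lambda_n^*;\hat G_n) \xrightarrow{a.s.} 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG > 0$.

Result (42) follows from

    $|\dot K(\lambda_0;\hat G_n)| = \Big|\int_{\mathbb R} 2\,\dot G_{\lambda_0}^{(n)}(x)\,\big(G_{\lambda_0}^{(n)}(x) - \hat G_n(x)\big)\,d\hat G_n(x)\Big|$
    $\le 2\,\|G_{\lambda_0}^{(n)} - \hat G_n\|_\infty \times \|\dot G_{\lambda_0}^{(n)}\|_\infty$
    $\le 2\,\big(\|A_{\lambda_0}S_rA_{\lambda_0}^{-1}[\hat G_n - G]\|_\infty + \|\hat G_n - G\|_\infty\big) \times \Big\|\frac{\partial}{\partial\lambda}[A_\lambda S_r A_\lambda^{-1}]\Big|_{\lambda=\lambda_0}\hat G_n\Big\|_\infty$.

The above inequality with Lemma 5.1 and (31) give the existence of a
nonnegative constant $c$ such that
$|\dot K(\lambda_0;\hat G_n)| \le c\,\|\hat G_n - G\|_\infty$. Thus, from
result (22), we get (42). In order to prove (43), let us write the second
derivative of $K(\cdot;\hat G_n)$ at point $\lambda$,

    $\ddot K(\lambda;\hat G_n) = 2\int_{\mathbb R}\ddot G_\lambda^{(n)}\,(G_\lambda^{(n)} - \hat G_n)\,d\hat G_n + 2\int_{\mathbb R}(\dot G_\lambda^{(n)})^2\,d\hat G_n$.

We have

(44)    $\Big|\ddot K(\lambda_n^*;\hat G_n) - 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG\Big| \le |\ddot K(\lambda_n^*;\hat G_n) - \ddot K(\lambda_0;\hat G_n)| + \Big|\ddot K(\lambda_0;\hat G_n) - 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG\Big|$.

By very simple calculations, we show that

    $|\ddot K(\lambda_n^*;\hat G_n) - \ddot K(\lambda_0;\hat G_n)| \le c\,|\lambda_n^* - \lambda_0|$,

where $c$ is a nonnegative constant arising from Lemma 5.4, (31) and the
fact that $\hat G_n$ is a cumulative distribution function. By the above
inequality and the strong consistency of $\hat\lambda_n$, we conclude that
$\ddot K(\lambda_n^*;\hat G_n) - \ddot K(\lambda_0;\hat G_n)$ converges almost
surely to 0.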

Concerning the second term of the right-hand side of (44), let us write

    $\Big|\ddot K(\lambda_0;\hat G_n) - 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG\Big|$
    $\le 2\,\Big|\int_{\mathbb R}\dot G_{\lambda_0}^2\,d\hat G_n - \int_{\mathbb R}\dot G_{\lambda_0}^2\,dG\Big|$
    $\quad + 2\,\|\ddot G_{\lambda_0}^{(n)}\|_\infty \times \|G_{\lambda_0}^{(n)} - \hat G_n\|_\infty$
    $\quad + 2\,\big(\|\dot G_{\lambda_0}^{(n)}\|_\infty + \|\dot G_{\lambda_0}\|_\infty\big) \times \|\dot G_{\lambda_0}^{(n)} - \dot G_{\lambda_0}\|_\infty$.

Let us investigate the three terms on the right-hand side of the above
inequality. From Lemmas 5.1 and 5.4, the second term is bounded, up to a
multiplicative nonnegative constant, by

    $\|G_{\lambda_0}^{(n)} - \hat G_n\|_\infty \le \|G_{\lambda_0}^{(n)} - G_{\lambda_0}\|_\infty + \|\hat G_n - G_{\lambda_0}\|_\infty \le (1 + |||A_{\lambda_0}S_rA_{\lambda_0}^{-1}|||) \times \|\hat G_n - G\|_\infty$

and then tends to 0 almost surely, by using (21) and the LIL result given
in (23). By similar arguments, we show that the third term has the same
property. The first term is a centered empirical mean of i.i.d. random
variables which, by Lemma 5.4, have a finite mean. Therefore, this term
converges almost surely to 0 by the strong law of large numbers. Thus, it
follows that

    $\Big|\ddot K(\lambda_0;\hat G_n) - 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG\Big| = o_{a.s.}(1)$.

We conclude the proof, noticing that $\ddot K(\lambda_0) > 0$ [the proof,
under C1, is similar to the proof of positive definiteness of
$\ddot K(\theta_0)$ in Section 5.5; therefore, it is omitted], and then

    $\ddot K(\lambda_0) = 2\int_{\mathbb R}\dot G_{\lambda_0}^2\,dG > 0$.

5.5. Proof of Theorem 3.3.

Proof of the consistency. Our method is based on a consistency proof for
minimum contrast estimators by Dacunha-Castelle and Duflo ([9], pages
94-96). Let us consider a countable dense set $D$ in $\Theta$. Then
$\inf_{\theta\in\Theta}K_r(\theta;\tilde G_n) = \inf_{\theta\in D}K_r(\theta;\tilde G_n)$
is a measurable random variable. We define, in addition, the random
variable

    $W(n,\xi) = \sup\{|K_r(\theta;\tilde G_n) - K_r(\theta';\tilde G_n)|;\ (\theta,\theta') \in D^2,\ |\theta - \theta'|_2 < \xi\}$,

and recall that $K(\theta_0) = 0$. Let us consider a nonempty open ball
$B_0$ centered on $\theta_0$ such that $K$ is bounded from below by a
positive real number $2\varepsilon$ on $\Theta\setminus B_0$. Let us consider
a sequence $(\xi_p)_{p\ge 1}$ decreasing to zero, and take $p$ such that there
exists a covering of $\Theta\setminus B_0$ by a finite number $\ell$ of balls
$(B_i)_{1\le i\le\ell}$ with centers $\theta_i \in \Theta$,
$i = 1,\ldots,\ell$, and radius less than $\xi_p$. Following Dacunha-Castelle
and Duflo [9], we have

(45)    $\limsup_n\{\hat\theta_n \notin B_0\} \subset \limsup_n\{W(n,\xi_p) > \varepsilon\} \cup \limsup_n\Big\{\inf_{1\le i\le\ell}\big(K_r(\theta_i;\tilde G_n) - K_r(\theta_0;\tilde G_n)\big) < \varepsilon\Big\}$.

By the uniform convergence result of Lemma 5.3, we have

(46)    $P\Big(\limsup_n\Big\{\inf_{1\le i\le\ell}\big(K_r(\theta_i;\tilde G_n) - K_r(\theta_0;\tilde G_n)\big) < \varepsilon\Big\}\Big) = 0$.

Because $K$ is Lipschitz on $\Theta$ by Lemma 5.2, we have that, for
sufficiently large $p'$, $|K(\theta) - K(\theta')| \le \varepsilon/2$ for all
$(\theta,\theta')$ such that $|\theta - \theta'|_2 < \xi_{p'}$. This implies

    $\limsup_n\{W(n,\xi_{p'}) > \varepsilon\}$
    $\subset \limsup_n\Big\{2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)| + |K(\theta) - K(\theta')| > \varepsilon\Big\}$
    $\subset \limsup_n\Big\{2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)| > \varepsilon/2\Big\}$,

and by Lemma 5.3 we have

    $P\Big(\limsup_n\Big\{2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)| > \varepsilon/2\Big\}\Big) = 0$,

which leads to

(47)    $P(\limsup_n\{W(n,\xi_{p'}) > \varepsilon\}) = 0$.

By (45)-(47), we have proved the strong consistency of the contrast
estimator $\hat\theta_n$.

Proof for the convergence rate. By standard Lebesgue theory, it is
straightforward to show that, under C2, the contrast function $K$ is twice
continuously differentiable on $\Theta$. If $\ddot K(\theta_0)$ is positive
definite, by Corollary 3.1 and a Taylor expansion of $K$ of order 2 at
$\theta_0$, there exist $\eta > 0$ and $a > 0$ such that, for all $u$
satisfying $|u|_2 < \eta$ and $\theta_0 + u \in \mathring\Theta$,

(48)    $K(\theta_0 + u) \ge a\,|u|_2^2$.
For a column vector $v = (v_1, v_2, v_3)^T \in \mathbb R^3$, we have

(49)    $v^T\ddot K(\theta_0)\,v = 2\int_{\mathbb R}(v^T\dot G_{\theta_0}(x))^2\,dG(x) \ge 0$.

By C2, we obtain that $x \mapsto \dot G_{\theta_0}(x)$ is continuous and that
$G$ is continuous and strictly increasing on $\mathbb R$. Thus, (49) implies
that $x \mapsto v^T\dot G_{\theta_0}(x)$ is the null function if
$v^T\ddot K(\theta_0)\,v = 0$. Because under C2 we have
$f_0' \in L^1(\mathbb R)$, it is easy to show that
$v^T\dot g_{\theta_0}(\cdot) \in L^1(\mathbb R)$, where
$g_\theta = A_\theta S_r A_\theta^{-1}g$. Moreover, using the Lebesgue
derivation theorem and (40), and denoting $\eta_0 = \mu_2^0 - \mu_1^0$, we
obtain

    $\Phi_{v^T\dot g_{\theta_0}}(t) = v^T\dot\Phi_{G_{\theta_0}}(t)$
    $= \frac{\Phi_G(-t)}{(\lambda_0\exp(-it\mu_1^0) + (1-\lambda_0)\exp(-it\mu_2^0))^2}$
    $\quad \times \big[\cos(\eta_0 t)\,(v_1(1-2\lambda_0) + it\,(v_2(1-\lambda_0) + v_3\lambda_0)) + it\,(v_2\lambda_0 + v_3(1-\lambda_0))\big]$.

Therefore, $v^T\ddot K(\theta_0)\,v = 0$ implies that
$\Phi_{v^T\dot g_{\theta_0}}$ is the null function. Because
$\Phi_G(-t)/(\lambda_0\exp(-it\mu_1^0) + (1-\lambda_0)\exp(-it\mu_2^0))^2$ is
not null in a neighborhood of 0, we obtain that the right multiplicative
term of the right-hand side of the above equality is null in a
neighborhood of 0, which in turn implies that $v = 0$. Thus
$\ddot K(\theta_0)$ is positive definite and (48) holds.

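Now, let us consider $B_0(\eta_n)$, the open ball centered at $\theta_0$
with radius $\eta_n > 0$. Notice that, for all
$\theta \in \Theta\setminus B_0(\eta_n)$, we have
$|\theta - \theta_0|_2 \ge \eta_n$. Then we write the event inclusions

    $\{\hat\theta_n \notin B_0(\eta_n)\} \subset \Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K_r(\theta;\tilde G_n) \le K_r(\theta_0;\tilde G_n)\Big\}$
    $\subset \Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K(\theta) - \sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)| \le K_r(\theta_0;\tilde G_n)\Big\}$
    $\subset \Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K(\theta) \le 2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)|\Big\}$
    $\subset \Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K(\theta) \le \gamma_n\Big\} \cup \Big\{\gamma_n \le 2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)|\Big\}$

for any arbitrary sequence $\gamma_n$. Thus, we have

    $\limsup_n\{\hat\theta_n \notin B_0(\eta_n)\} \subset \limsup_n\Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K(\theta) \le \gamma_n\Big\} \cup \limsup_n\Big\{\gamma_n \le 2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)|\Big\}$.

Choosing now $\gamma_n = n^{-1/2+\alpha}$ and $\eta_n = n^{-1/4+\beta/2}$, with
$0 < \alpha < \beta$ taken arbitrarily small, it follows from (48) and the
uniform almost sure rate of convergence of $K_r(\cdot;\tilde G_n)$ toward
$K$, given in Lemma 5.3, that

    $P\Big(\limsup_n\Big\{\inf_{\theta\in\Theta\setminus B_0(\eta_n)}K(\theta) \le \gamma_n\Big\}\Big) = 0$

and

    $P\Big(\limsup_n\Big\{\gamma_n \le 2\sup_{\theta\in\Theta}|K_r(\theta;\tilde G_n) - K(\theta)|\Big\}\Big) = 0$.

In conclusion, $\hat\theta_n$ converges almost surely toward $\theta_0$ at
rate $n^{-1/4+\delta}$ with $\delta > 0$ chosen arbitrarily small.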



5.6. Proof of Theorem 3.4.

Proof of (i) and (ii). We have

    $\hat F_n - F_0 = \frac{1}{2}(I + S_r)\big[A_{\hat\theta_n}^{-1}\hat G_n - A_{\theta_0}^{-1}G\big]$.

Thus, there exists a nonnegative real constant $c$ such that

    $\|\hat F_n - F_0\|_\infty \le \|A_{\hat\theta_n}^{-1}(\hat G_n - G)\|_\infty + \|(A_{\hat\theta_n}^{-1} - A_{\theta_0}^{-1})G\|_\infty$
    $\le |||A_{\hat\theta_n}^{-1}||| \times \|\hat G_n - G\|_\infty + c\,|\hat\theta_n - \theta_0|_2$
    $\le \frac{1}{2d}\,\|\hat G_n - G\|_\infty + c\,|\hat\theta_n - \theta_0|_2$,

where the second inequality follows from (29) in the proof of Lemma 5.2
and the last inequality follows from (21), using the fact that $G$ is
Lipschitz. Finally, the above inequality together with (22) [resp. (23)]
and Theorem 3.2 (resp. Theorem 3.3) yield result (i) [resp. result (ii)].
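
The inversion formula behind this bound can be tried out on simulated data. The sketch below is only an illustration under stated assumptions: the Euclidean parameters are plugged in at known values instead of being estimated, the symmetry operator is assumed to act on c.d.f.s as $S_r H(x) = 1 - H(-x)$, the inverse mixing operator is assumed to have the series form used in the proof of Lemma 5.2 (truncated after a fixed number of terms), and all function names are hypothetical.

```python
import bisect
import random


def simulate_mixture(n, lam, mu1, mu2, seed=0):
    """Sample from lam*F(. - mu1) + (1 - lam)*F(. - mu2), F standard normal."""
    rng = random.Random(seed)
    return [rng.gauss(mu1 if rng.random() < lam else mu2, 1.0)
            for _ in range(n)]


def empirical_cdf(sample):
    """Empirical c.d.f. x -> #{X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def invert_mixing(H, lam, mu1, mu2, n_terms=100):
    """Truncated series for the inverse mixing operator (0 <= lam < 1/2)."""
    eta = mu2 - mu1
    ratio = -lam / (1.0 - lam)

    def inverse(x):
        total, coef = 0.0, 1.0
        for k in range(n_terms):
            total += coef * H(x + mu2 + k * eta)
            coef *= ratio
        return total / (1.0 - lam)

    return inverse


def symmetrize(H):
    """0.5*(I + S_r)H, with the assumed form S_r H(x) = 1 - H(-x)."""
    return lambda x: 0.5 * (H(x) + 1.0 - H(-x))


def estimate_F(sample, lam, mu1, mu2):
    """Plug-in estimate of the symmetric c.d.f. F from a mixture sample."""
    return symmetrize(invert_mixing(empirical_cdf(sample), lam, mu1, mu2))
```

By construction the returned function satisfies $\hat F(x) + \hat F(-x) = 1$, and its sup-norm error is at most $1/(1-2\lambda)$ times that of the empirical c.d.f., which mirrors the role of the factor $1/(2d)$ in the inequality above.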

Proof of (iii). By the Devroye [12] $L^1$-consistency result, we have

(50)    $\|\hat g_n - g\|_1 = \int_{\mathbb R}|\hat g_n(x) - g(x)|\,dx \xrightarrow{a.s.} 0$

as $n \to +\infty$, provided that $b_n \to 0$ and $nb_n \to +\infty$. Then we
can write

    $\|\hat f_n - f_0\|_1 = \|A_{\hat\theta_n}^{-1}\hat g_n - A_{\theta_0}^{-1}g\|_1$
    $\le \|A_{\hat\theta_n}^{-1}(\hat g_n - g)\|_1 + \|(A_{\hat\theta_n}^{-1} - A_{\theta_0}^{-1})g\|_1$
    $\le \frac{1}{2d}\,\|\hat g_n - g\|_1 + c\,|\hat\theta_n - \theta_0|_2$,

where the last inequality comes from (29), because $f_0 \in$ … and, thus,
the same holds for $g$. We conclude with Theorems 3.2 and 3.3, and (50).